REMARKS 

In the outstanding office action, claims 1-19 were presented for examination. Applicant 
notes the withdrawal of the previous grounds for rejection. All rejections made in the 
instant application are based on a new reference to Minematsu. It is respectfully 
submitted that this rejection should be withdrawn on account of basic differences 
between the method of the present invention and the apparatus and method taught by 
Minematsu. More particularly, the claims, as amended, clearly recite important 
differences between Minematsu and the present invention. 

While, clearly, Minematsu deals with the problems associated with mispronunciations, 
it does so with a database built only using mispronounced speech. 

In contrast, in accordance with the present invention, as claimed in only some of the 
claims, a database of properly pronounced English is assembled using a speaker of 
proper English. Likewise, a database of mispronunciations is generated using a speaker 
or speakers who customarily mispronounce words. This alone supports patentability 
without any of the other limitations discussed in this amendment. 

Minematsu is keyed to word-by-word mispronunciations, whereas the present invention 
addresses both word and phrase pronunciation errors. This alone supports patentability without any of the other 
limitations discussed in this amendment. Nor is this a theoretical point, as many 
mispronunciations occur in the slurring of the end of one word into the beginning of 
another word. Such issues are thoroughly treated in the Lessac book cited in the 
specification. 

In addition to the above schematic differences in the approach of Minematsu as 
compared to the methodology of the present invention, there is a fundamental aspect of 
difference between Minematsu and Lessac. In particular, there are two approaches to 
the problem of teaching a person to speak properly. The first, which is overwhelmingly 
endorsed in one segment of the academic world and in the scientific 
community, is that of the International Phonetic Alphabet, or the "IPA" as it is more 
commonly called. The essence of this approach is the concept that each sound in a 
language has a proper pronunciation and that language can be synthesized by an 
individual by having that individual listen to a proper pronunciation of the sound, hear 
his own proper or improper pronunciation and thus approach and achieve perfect 
pronunciation. 

In contrast, the Lessac approach is keyed to the feeling of sound and words with 
analogies to the sounds of musical instruments. While the Lessac approach has been 
rejected by the scientific community, it has many enthusiastic and prominent followers 
in the theatrical community. Accordingly, there is a strong bias against the use of 
Lessac's approach in scientific approaches to voice-oriented problems. 

Certainly, not all of the claims are limited to the limitations of using the Lessac 
approach. But those claims which have that limitation, and that limitation alone, have a 
content which militates strongly in favor of patentability. It is clear that in the 
recognition of speech phonemes, a system which is predicated on the proper 
pronunciation of a finite number of phonemes which can be individually treated to deal 
with voice-related computer tasks, is easily accepted by the scientific community. In 
contrast, it is easy to understand the scientific prejudice against use of the Lessac 
system. This renders application of Lessac non-obvious. 

In addition to the differences in the basic nature of the Lessac approach, implemented in 
accordance with certain aspects of the present invention as claimed in only some of the 
claims of the present invention, and the International Phonetic Alphabet approach of 
Minematsu, which clearly shows how the scientific community would have a prejudice 
against a Lessac approach, objective data conclusively shows the scientific prejudice. In 
particular, a search was conducted on the United States Patent and Trademark Office 
database, and 33 patents were found that refer to the "International Phonetic 
Alphabet". See Exhibit A. 

On the other hand, when the same search was conducted on the United States Patent 
and Trademark Office database for patents including the term "Lessac", not a single 
patent was uncovered. See Exhibit B. The evidence is clear that there would be a strong 
prejudice in technological pursuits against substitution of Lessac techniques for 
International Phonetic Alphabet approaches. This alone supports patentability without 
any of the other limitations discussed in this amendment. 

Still other aspects of the invention of the present application, standing alone, support 
patentability, and independent claims focused on these and the above issues have been 
submitted with this amendment. 

Also in accordance with the present invention, a user is presented with an interactive 
training program in response to the detection of repeated instances or a reliable single 
instance of pronunciation error. This is not taught by Minematsu. Here again, this 
alone supports patentability without any of the other limitations discussed in this 
amendment. 
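
By way of illustration only, and without limiting the claims, the triggering condition described above (repeated instances or a reliable single instance of pronunciation error) can be sketched in Python as follows. The class name TrainingTrigger, the use of a confidence score, and the default values of 3 repetitions and 0.9 confidence are hypothetical assumptions made for this sketch and do not appear in the application.

```python
from collections import Counter


class TrainingTrigger:
    """Decide when to offer training (hypothetical sketch, not the claimed method)."""

    def __init__(self, repeat_threshold: int = 3, single_confidence: float = 0.9):
        # Both defaults are assumed values chosen only for illustration.
        self.repeat_threshold = repeat_threshold
        self.single_confidence = single_confidence
        self.counts = Counter()

    def observe(self, error_id: str, confidence: float) -> bool:
        """Record one detected error; return True if training should be offered."""
        self.counts[error_id] += 1
        # A "reliable single instance": one detection with high confidence.
        reliable_single = confidence >= self.single_confidence
        # "Repeated instances": the same error seen often enough.
        repeated = self.counts[error_id] >= self.repeat_threshold
        return reliable_single or repeated
```

In this sketch, a low-confidence detection offers training only once the same error has recurred the assumed number of times, while a single high-confidence detection offers it immediately.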

In addition to the above, in accordance with the present invention, speech training is 
achieved in the course of a speech-to-text recognition process. This is not taught by 
Minematsu. 

Turning to the substance of the claims, processing of words and phrases is addressed in 
claims 1 and 19. As noted above, this allows the system to address errors in word 
combinations, something not addressed by Minematsu, which is limited to dealing with, 
for example, simple mispronunciations of words by people having a particular 
characteristic accent, such as Japanese. Accordingly, it is believed that claims 1 and 19 
are clearly allowable. 

Claims 3, 5, 11, 17 and 24 recite presenting the user with the option of receiving speech 
training. As noted above, the same is not taught by 
Minematsu. Accordingly, it is believed that these claims are in condition for allowance. 

Claims 9, 10, 14, 15, 16, 18 and 22 relate to various aspects of the Lessac approach and, 
accordingly, are believed to be patentable. 



Claims 12, 25 and 26 deal with the sensitivity of error detection for the threshold 
algorithm, something which is not remotely taught by the prior art, and which, 
accordingly, is believed to render these claims allowable over the art of record. 

Claims 21 and 27 deal with a method which incorporates the development of a database 
using speakers who speak properly and other speakers who do not pronounce words 
properly. This is not remotely taught by the prior art and is also believed to be clearly 
patentable subject matter. As noted above, this is something which is not remotely 
taught by the art of record and, accordingly, this clearly renders the subject matter of 
these claims patentable over Minematsu. 

In view of the above amendments and the discussion relating thereto, it is 
respectfully submitted that the instant application is in condition for allowance. Such 
action is most earnestly solicited. If for any reason the Examiner feels that consultation 
with Applicant's attorney would be helpful in the advancement of the prosecution, he is 
invited to call the telephone number below for an interview. 



Respectfully submitted, 

By: 

Anthony H. Handal 
Reg. No. 26,275 
Roger Pitt 
Reg. No. 46,996 

HANDAL & MOROFSKY 
80 Washington Street 
Norwalk, CT 06854 
(203)838-8589 

I hereby certify that this correspondence is being deposited with the United States Postal Service as first class mail, postage prepaid, 
in an envelope addressed to: Assistant Commissioner for Patents, Washington, D.C. 20231, on April 15, 2002. 



Anthony H. Handal / Roger Pitt 
Reg. No. 26,275/46,996 







"VERSION OF AMENDED CLAIMS WITH MARKINGS TO SHOW CHANGES MADE" 

1. (amended) A method of speech recognition using a microphone to receive audible 
sounds input by a user into a [first] computing device coupled to said microphone, said 
computing device having a program with a database comprising [consisting of] (i) 
digital representations of known audible sounds corresponding to proper 
pronunciations of words and phrases and associated alphanumeric representations of 
said known audible sounds corresponding to proper pronunciations of words and 
phrases and (ii) digital representations of known audible sounds corresponding to 
mispronunciations [resulting from] associated with known [classes of] mispronounced 
words and phrases, comprising the steps of: 

(a) receiving said audible sounds in the form of [the] an electrical output of said 
microphone; 

(b) converting said electrical output corresponding to a particular audible sound 
into a digital representation of said particular audible sound; 

(c) comparing said digital representation of said particular audible sound to said 
digital representations of said known audible sounds to determine which of said known 
audible sounds is most likely to be the particular audible sound being compared to the 
sounds in said database; 

(d) outputting as a speech recognition output the alphanumeric representations 
associated with said audible sound most likely to be said particular audible sound; 

(e) receiving an error indication from said user indicating that there is an error in 
recognition; 

(f) receiving from said user an indication of the proper alphanumeric 
representations of said particular audible sound; 

(g) determining whether said error is a result of a known type or instance of 
mispronunciation; and 

(h) in response to a determination of error corresponding to a known type or 
instance of mispronunciation, presenting an interactive training program from said 
computing device [computer] to said user to enable said user to correct such 
mispronunciation. 

2. A method of speech recognition as in claim 1, wherein said interactive training 
program comprises playback of the properly pronounced sound from a database of 
recorded sounds corresponding to proper pronunciations of said mispronunciations 
resulting from said known classes of mispronounced words and phrases. 

3. A method of speech recognition as in claim 2, wherein the user is given the option of 
receiving speech training or training the program to recognize the user's speech pattern. 

4. A method of speech recognition as in claim 3, wherein said determination of 
whether said error is a result of a known type or instance of mispronunciation is 
performed by comparing the mispronunciation to said digital representations of known 
audible sounds corresponding to mispronunciations resulting from known classes of 
mispronounced words and phrases using a speech recognition engine. 

5. A method of speech recognition as in claim 1, wherein the user is given the option of 
receiving speech training or training the program to recognize the user's speech pattern. 

6. A method of speech recognition as in claim 1, wherein said determination of whether 
said error is a result of a known type or instance of mispronunciation is performed by 
comparing the mispronunciation to said digital representations of known audible 
sounds corresponding to mispronunciations resulting from known classes of 
mispronounced words and phrases using a speech recognition engine. 

7. (amended) A method of speech recognition as in claim 1, wherein said database 
consisting of (i) digital representations of known audible sounds and associated 
alphanumeric representations of said known audible sounds and (ii) digital 
representations of known audible sounds corresponding to mispronunciations resulting 
from known classes of mispronounced words and phrases, is generated by the steps of 
speaking and digitizing on a second computer said known audible sounds and said 
known audible sounds corresponding to mispronunciations resulting from known 
classes of mispronounced words and phrases, and transferring the same to said first 
computing device. 

8. (cancelled) A method of speech recognition as in claim 7, wherein said database has 
been introduced into said computing device after said generation by speaking and 
digitizing has been done on another computing device and transferred together with 
voice recognition and error correcting subroutines to first computing device. 

9. A method of speech recognition as in claim 1, wherein said interactive program 
instructs the user using Lessac System techniques. 

10. A method of speech recognition as in claim 1, wherein, said interactive program 
instructing the user in the correct pronunciation of said sound in terms of the sound of a 
musical instrument. 

11. A method of speech recognition as in claim 1, wherein said presenting an interactive 
training program from said computer to said user to enable said user to correct such 
mispronunciation is optional and is performed when elected by the user. 

12. (amended) A method of speech recognition as in claim 1, wherein said user is 
presented with an interactive training program in response to the detection of repeated 
instances or a reliable single instance [or] of pronunciation error [s]. 

13. A method of speech recognition as in claim 1, wherein said user is given the option 
of correcting said digital representations of known audible sounds. 

14. (amended) A method of speech recognition using a microphone to receive audible 
sounds input by a user into a first computing device having a program with a database 
consisting of (i) digital representations of known audible sounds and associated 
alphanumeric representations of said known audible sounds and (ii) digital 
representations of known audible sounds corresponding to mispronunciations resulting 
from known classes of mispronounced words and phrases, comprising the steps of: 

(a) receiving said audible sounds in the form of the electrical output of said 
microphone; 

(b) converting a particular audible sound into a digital representation of said 
audible sound; 

(c) comparing said digital representation of said particular audible sound to said 
digital representations of said known audible sounds to determine which said known 
audible sounds is most likely to be the particular audible sound being compared to the 
sounds in said database; 

(d) outputting as a speech recognition output the alphanumeric representations 
associated with said audible sound most likely to be said particular audible sound; 

(e) determining whether there is an error in pronunciation to generate an error 
indication indicating that there is an error in recognition; 

(f) determining whether said error is a result of a known type or instance of 
mispronunciation in response to the detection of repeated instances or a reliable single 
instance of mispronunciation; and 

(g) in response to a determination of error corresponding to a known type or 
instance of mispronunciation, presenting an interactive training program from said 
computer to said user to enable said user to correct such mispronunciation in 
accordance with Lessac techniques. 

15. A method of speech recognition as in claim 14, said interactive program instructing 
the user in the correct pronunciation of said sound in terms of the sound of a musical 
instrument. 

16. A method of speech recognition as in claim 14, wherein said interactive program 
instructs the user using Lessac System techniques. 

17. A method of speech recognition as in claim 14, wherein said presenting an 
interactive training program from said computer to said user to enable said user to 
correct such mispronunciation is optional and is performed when elected by the user. 

18. A method of speech recognition as in claim 14, said interactive program instructing 
the user in the correct pronunciation of said sound in terms of the sound of a musical 
instrument. 

19. (amended) A method of speech recognition [as in claim 14, wherein said user is 
given the option of correcting said digital representations of known audible sounds.] 
using a microphone to receive audible sounds input by a user into a computing device 
coupled to said microphone, said computing device having a program with a database 
comprising (i) digital representations of known audible sounds corresponding to proper 
pronunciations of words and phrases and associated alphanumeric representations of 
said known audible sounds corresponding to proper pronunciations of words and 
phrases and (ii) digital representations of known audible sounds corresponding to 
mispronunciations, comprising the steps of: 

(a) receiving said audible sounds in the form of an electrical output of said 
microphone; 

(b) converting said electrical output corresponding to a particular audible sound 
into a digital representation of said particular audible sound; 

(c) comparing said digital representation of said particular audible sound to said 
digital representations of said known audible sounds to determine a match with the one 
of said known audible sounds most likely to be the particular audible sound being 
compared to the sounds in said database; 

(d) outputting as a speech recognition output the alphanumeric representations 
associated with said audible sound most likely to be said particular audible sound; 

(e) outputting an error indication in response to a match with a known audible 
sound corresponding to a known mispronunciation; and 

(f) in response to a determination of error corresponding to a known 
mispronunciation, presenting an interactive training program from said computing 
device to said user to enable said user to correct such mispronunciation. 

20. (new) A method of speech recognition using a microphone to receive audible 
sounds input by a user into a computing device coupled to said microphone, said 
computing device having a program with a database comprising (i) digital 
representations of known audible sounds corresponding to proper pronunciations of 
phonemes and associated alphanumeric representations of said known audible sounds 
corresponding to proper pronunciations of phonemes and (ii) digital representations of 
known audible sounds corresponding to mispronunciations, comprising the steps of: 

(a) receiving said audible sounds in the form of an electrical output of said 
microphone; 

(b) converting said electrical output corresponding to a particular audible sound 
into a digital representation of said particular audible sound; 

(c) comparing said digital representation of said particular audible sound to said 
digital representations of said known audible sounds to determine a match with the one 
of said known audible sounds most likely to be the particular audible sound being 
compared to the sounds in said database; 

(d) outputting as a speech recognition output the alphanumeric representations 
associated with said audible sound most likely to be said particular audible sound; 

(e) outputting an error indication in response to a match with a known audible 
sound corresponding to a known mispronunciation; and 

(f) in response to a determination of error corresponding to a known type or 
instance of mispronunciation, giving the user the option of receiving speech training or 
training said program to recognize the user's speech pattern; and 

(g) in response to exercise of said option, presenting an interactive training 
program from said computing device to said user to enable said user to correct such 
mispronunciation. 

21. (new) A method of speech recognition using a microphone to receive audible 
sounds input by a user into a computing device coupled to said microphone, said 
computing device having a program with a database comprising (i) digital 
representations of known audible sounds corresponding to proper pronunciations of 
phonemes and associated alphanumeric representations of said known audible sounds 
corresponding to proper pronunciations of phonemes and (ii) digital representations of 
known audible sounds corresponding to mispronunciations, comprising the steps of: 

(a) forming a database by (i) having a person, who normally speaks said known 
audible sounds properly, speak said known audible sounds, and digitizing said known 
audible sounds spoken by said person who properly speaks said known audible 
sounds; and (ii) having a person who usually speaks said known audible sounds 
corresponding to mispronunciations and digitizing said known audible sounds spoken 
by said person who usually speaks said known audible sounds corresponding to 
mispronunciations; 







(b) receiving said audible sounds in the form of an electrical output of said 
microphone receiving speech to be recognized; 

(c) converting said electrical output corresponding to a particular audible sound 
into a digital representation of said particular audible sound; 

(d) comparing said digital representation of said particular audible sound to said 
digital representations of said known audible sounds to determine a match with the one 
of said known audible sounds most likely to be the particular audible sound being 
compared to the sounds in said database; 

(e) outputting as a speech recognition output the alphanumeric representations 
associated with said audible sound most likely to be said particular audible sound; 

(f) outputting an error indication in response to a match with a known audible 
sound corresponding to a known mispronunciation; and 

(g) in response to a determination of error corresponding to a known 
mispronunciation, presenting an interactive training program from said computing 
device to said user to enable said user to correct such mispronunciation. 

22. (new) A method of speech recognition using a microphone to receive audible 
sounds input by a user into a computing device coupled to said microphone, said 
computing device having a program with a database comprising (i) digital 
representations of known audible sounds corresponding to proper pronunciations of 
phonemes and associated alphanumeric representations of said known audible sounds 
corresponding to proper pronunciations of phonemes and (ii) digital representations of 
known audible sounds corresponding to mispronunciations, comprising the steps of: 

(a) receiving said audible sounds in the form of an electrical output of said 
microphone receiving speech to be recognized; 

(b) converting said electrical output corresponding to a particular audible sound 
into a digital representation of said particular audible sound; 

(c) comparing said digital representation of said particular audible sound to said 
digital representations of said known audible sounds to determine a match with the one 
of said known audible sounds most likely to be the particular audible sound being 
compared to the sounds in said database; 

(d) outputting as a speech recognition output the alphanumeric representations 
associated with said audible sound most likely to be said particular audible sound; 

(e) outputting an error indication in response to a match with a known audible 
sound corresponding to a known mispronunciation; and 

(f) in response to a determination of error corresponding to a known 
mispronunciation, presenting an interactive training program from said computing 
device to said user to enable said user to correct such mispronunciation using Lessac 
System techniques. 

23. (new) A method of speech recognition using a microphone to receive audible 
sounds input by a user into a computing device coupled to said microphone, said 
computing device having a program with a database comprising (i) digital 
representations of known audible sounds corresponding to proper pronunciations of 
phonemes and associated alphanumeric representations of said known audible sounds 
corresponding to proper pronunciations of phonemes and (ii) digital representations of 
known audible sounds corresponding to mispronunciations, comprising the steps of: 

(a) receiving said audible sounds in the form of an electrical output of said 
microphone receiving speech to be recognized; 

(b) converting said electrical output corresponding to a particular audible sound 
into a digital representation of said particular audible sound; 

(c) comparing said digital representation of said particular audible sound to said 
digital representations of said known audible sounds to determine a match with the one 
of said known audible sounds most likely to be the particular audible sound being 
compared to the sounds in said database; 

(d) outputting as a speech recognition output the alphanumeric representations 
associated with said audible sound most likely to be said particular audible sound; 

(e) outputting an error indication in response to a match with a known audible 
sound corresponding to a known mispronunciation; and 

(f) in response to the detection of repeated instances or a reliable single instance 
of pronunciation error, presenting an interactive training program from said computer 
to said user to enable said user to correct such mispronunciation. 

24. (new) A method of speech recognition as in claim 23, wherein said presenting an 
interactive training program from said computer to said user to enable said user to 
correct such mispronunciation is optional and is performed when elected by the user. 

25. (new) A method of speech recognition as in claim 21, wherein said user is presented 
with an interactive training program in response to the detection of repeated instances 
or a reliable single instance of pronunciation error. 

26. (new) A method of speech recognition as in claim 22, wherein said user is presented 
with an interactive training program in response to the detection of repeated instances 
or a reliable single instance of pronunciation error. 

27. (new) A method of speech recognition as in claim 22, wherein said database 
comprising (i) digital representations of known audible sounds corresponding to proper 
pronunciations of phonemes and associated alphanumeric representations of said 
known audible sounds corresponding to proper pronunciations of phonemes and (ii) 
digital representations of known audible sounds corresponding to mispronunciations is 
formed by (i) having a person, who normally speaks said known audible sounds 
properly, speak said known audible sounds, and digitizing said known audible sounds 
spoken by said person who properly speaks said known audible sounds; and (ii) having 
a person who usually speaks said known audible sounds corresponding to 
mispronunciations and digitizing said known audible sounds spoken by said person 
who usually speaks said known audible sounds corresponding to mispronunciations. 
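
For orientation only, and not as part of any claim, steps (c) through (f) of claim 19 can be sketched in Python as follows. The names KnownSound and recognize, the representation of each sound as a tuple of numeric features, and the use of Euclidean distance as the measure of the "most likely" match are all assumptions made for this sketch; the claims do not specify them. It is likewise assumed here that each mispronunciation entry stores the text of the word the speaker intended.

```python
from dataclasses import dataclass
from math import dist
from typing import Sequence, Tuple


@dataclass(frozen=True)
class KnownSound:
    features: Tuple[float, ...]      # assumed digital representation of the sound
    text: str                        # associated alphanumeric representation
    mispronunciation: bool = False   # True for entries in part (ii) of the database


def recognize(sample: Sequence[float],
              database: Sequence[KnownSound]) -> Tuple[str, bool]:
    """Return (recognized text, error indication) for one digitized sound."""
    # Step (c): find the known sound closest to the sample (assumed metric).
    best = min(database, key=lambda known: dist(sample, known.features))
    # Step (d): output the text associated with the best match.
    # Step (e): the error indication is set when the best match is a
    # known mispronunciation; a caller could then start training (step (f)).
    return best.text, best.mispronunciation
```

A caller that receives a true error indication would then present the interactive training program; how that program operates is outside the scope of this sketch.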